Determining the optimal ethanol/water ratio to maximize solubility in ethanol/water cosolvent systems
Researchers: Andrew SID Lang, Enrique F Valderrama Araya, and Bryant G Wilson
Introduction
Predicting the solubility of a compound in the ethanol/water cosolvent system is important for certain organic syntheses and associated recrystallization procedures as well as drug delivery techniques for drugs that have poor aqueous solubilities. Here, building on the work by Abraham and Acree [1], we present models that can be used to find the optimal ethanol/water volume fractions to optimize solubilities.
Method
Abraham solvent coefficients, polynomial fit
Using the data from[ref] for the standard Abraham solvent coefficients
[ref2], different polynomial orders where calculated using the statistical software R[ref?] and using Maxima-CAS[ref]. Were observed for the best fit all agree reasonably well to the same polynomial order poly(3), with the exception for the solvent coefficient, "e", which a polynomial of order 5 was need to fit it properly. The statistics variation on the reference data[ref2] in some cases R^2 is only of 0.75, which could justify the oscillation of the data and the poor fit for poly(3) to the solvent for "e". However more interesting is to be able to fit with a polynomial of 3rd order all the solvent coefficients equations, and as well what are the implications of this. We need to investigate the physics which make it more agree with poly(3) or even poly(5), but not for even order of polynomial fits.
The coefficients for the fit for each Abraham solvent coefficient are:
The polynomial fit of order 5
present better fit, but we need to add 2 more parameters per curve, a total of 12 new parameters when the solubility is calculated. The plot of the fit by polynomial of order 5 can be seen it below.
and the coefficients forcing to pass by zero are:
The polynomial of order 4 with the coefficient of order zero floating
The polynomial of order 4 with the coefficient of order zero force to be zero
Comparison for different fits of the coefficients forcing to pass by zero
Interactive section:
{
c(x)=0.004 -2.276x+5.457x^2-2.955x^3 click here to plot and play with c(x)
e(x)=-0.047 +1.073x-2.536x^2+1.935x^3
s(x)=0.042 -0.905x+1.757x^2-1.851x^3
a(x)=0.001 +0.558x-0.147x^2-0.113x^3
b(x)=0.037 -4.244x-1.791x^2+2.428x^3
v(x)=-0.037 +5.039x+0.135x^2-1.346x^3
}
Rcode:
This code will generate the coefficients of fits on a csv file R-fits , which will be use later on opensource software maxima
/* [wxMaxima batch file version 1] [ DO NOT EDIT BY HAND! ]*/
/* [ Created with wxMaxima version 13.04.2 ] */
/* [wxMaxima: input start ] */
/* coefficients file name */
cfn: R011515;
/*polynomial order */
PO: 3;
/* 2 column csv with first the 4 coefficients then R^2 and R^2adj for each */
m: read_matrix("C:/R/fit_poly3_011515_.csv", 'csv);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
for i:0 thru PO step 1 do c[i]:m[i+1,2]$
for i:0 thru PO step 1 do ldisplay(c[i]);
for i:0 thru PO step 1 do e[i]:m[i+1+(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(e[i]);
for i:0 thru PO step 1 do s[i]:m[i+1+2*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(s[i]);
for i:0 thru PO step 1 do a[i]:m[i+1+3*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(a[i]);
for i:0 thru PO step 1 do b[i]:m[i+1+4*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(b[i]);
for i:0 thru PO step 1 do v[i]:m[i+1+5*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(v[i]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
c(x):=sum (c[i]*x^i, i, 0, PO);
e(x):=sum (e[i]*x^i, i, 0, PO);
s(x):=sum (s[i]*x^i, i, 0, PO);
a(x):=sum (a[i]*x^i, i, 0, PO);
b(x):=sum (b[i]*x^i, i, 0, PO);
v(x):=sum (v[i]*x^i, i, 0, PO);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
mraw: read_matrix(concat("C:/R/",string(cfn),".csv"), 'csv);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xc: [
[mraw[2,1],mraw[2,2]],
[mraw[3,1],mraw[3,2]],
[mraw[4,1],mraw[4,2]],
[mraw[5,1],mraw[5,2]],
[mraw[6,1],mraw[6,2]],
[mraw[7,1],mraw[7,2]],
[mraw[8,1],mraw[8,2]],
[mraw[9,1],mraw[9,2]],
[mraw[10,1],mraw[10,2]],
[mraw[11,1],mraw[11,2]],
[mraw[12,1],mraw[12,2]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xc], c(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "c(x): exp. data", "c(x): poly(3) fit"],
/*[label,["R^2=0.998038422",0.66,-0.2] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/c",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xe: [
[mraw[2,1],mraw[2,3]],
[mraw[3,1],mraw[3,3]],
[mraw[4,1],mraw[4,3]],
[mraw[5,1],mraw[5,3]],
[mraw[6,1],mraw[6,3]],
[mraw[7,1],mraw[7,3]],
[mraw[8,1],mraw[8,3]],
[mraw[9,1],mraw[9,3]],
[mraw[10,1],mraw[10,3]],
[mraw[11,1],mraw[11,3]],
[mraw[12,1],mraw[12,3]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xe], e(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "e(x): exp. data", "e(x): poly(3) fit"],
/*[label,["R^2=0.944630971",0.66,0.4] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/e",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xs: [
[mraw[2,1],mraw[2,4]],
[mraw[3,1],mraw[3,4]],
[mraw[4,1],mraw[4,4]],
[mraw[5,1],mraw[5,4]],
[mraw[6,1],mraw[6,4]],
[mraw[7,1],mraw[7,4]],
[mraw[8,1],mraw[8,4]],
[mraw[9,1],mraw[9,4]],
[mraw[10,1],mraw[10,4]],
[mraw[11,1],mraw[11,4]],
[mraw[12,1],mraw[12,4]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xs], s(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "s(x): exp. data", "s(x): poly(3) fit"],
/*[label,["R^2=0.981372471",0.66,-1] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file,concat("C:/R/s",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xa: [
[mraw[2,1],mraw[2,5]],
[mraw[3,1],mraw[3,5]],
[mraw[4,1],mraw[4,5]],
[mraw[5,1],mraw[5,5]],
[mraw[6,1],mraw[6,5]],
[mraw[7,1],mraw[7,5]],
[mraw[8,1],mraw[8,5]],
[mraw[9,1],mraw[9,5]],
[mraw[10,1],mraw[10,5]],
[mraw[11,1],mraw[11,5]],
[mraw[12,1],mraw[12,5]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xa], a(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "a(x): exp. data", "a(x): poly(3) fit"],
/*[label,["R^2=0.986234862",0.66,0.05] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/a",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xb: [
[mraw[2,1],mraw[2,6]],
[mraw[3,1],mraw[3,6]],
[mraw[4,1],mraw[4,6]],
[mraw[5,1],mraw[5,6]],
[mraw[6,1],mraw[6,6]],
[mraw[7,1],mraw[7,6]],
[mraw[8,1],mraw[8,6]],
[mraw[9,1],mraw[9,6]],
[mraw[10,1],mraw[10,6]],
[mraw[11,1],mraw[11,6]],
[mraw[12,1],mraw[12,6]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xb], b(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "b(x): exp. data", "b(x): poly(3) fit"],
/*[label,["R^2=0.999726899",0.66,-0.7] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/b",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
xv: [
[mraw[2,1],mraw[2,7]],
[mraw[3,1],mraw[3,7]],
[mraw[4,1],mraw[4,7]],
[mraw[5,1],mraw[5,7]],
[mraw[6,1],mraw[6,7]],
[mraw[7,1],mraw[7,7]],
[mraw[8,1],mraw[8,7]],
[mraw[9,1],mraw[9,7]],
[mraw[10,1],mraw[10,7]],
[mraw[11,1],mraw[11,7]],
[mraw[12,1],mraw[12,7]]
];
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
plot2d([[discrete, xv], v(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "v(x): exp. data", "v(x): poly(3) fit "],
/*[label,["R^2=0.999529421",0.66,0.5]],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/v",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input end ] */
/* [wxMaxima: input start ] */
c(x);
e(x);
s(x);
a(x);
b(x);
v(x);
/* [wxMaxima: input end ] */
/* Maxima can't load/batch files which end with a comment! */
"Created with wxMaxima"$
Results
Maximum Solubility
Using the equation log_10 P = c+eE+sS+aA+bB+vV, the maximum solubility of compounds with known Abraham descriptors in any water-ethanol system can be predicted. The following code uses Python 2.7.6 to calculate maximum and minimum solubilities for a list of compounds;it also produces plots for each compound. The program uses sympy to find where the derivative of equation equals 0 and then substitutes those values into the original equation to find the max/min. It requires non-standard libraries including joblib, numpy, sympy, and matplotlib. The //x**2is required for sympy// to properly parse exponents. The program uses the descriptors file with ";" as delimiters. [[file:Compounds is an updated version of ONSAD001 which includes the max/min solubilities for each compound using the 3rd order polynomial; it does not include duplicates or any compounds with unknown Abraham descriptors.
The following calculates the max/min of the compounds in ONSAD001 using 3rd, 4th, or 5th order polyfit. It also creates a graph of each compound and it's solubility. At the end it will create a historgrams showing the frequency of the maximums and then the minimums.
from__future__import print_function
import joblib
from joblib import Parallel, delayed
import matplotlib.pyplotas plt
import matplotlib.mlabas mlab
import numpy as np
from sympy import *
importosfrommathimport exp
polyfits=['5']#for now it can only run one item at a time (e.g. '3'). If I can figure out why the program freezes my computer when there is more than one item, then it could run multiple polyfits at in sequence.for polyfit in polyfits:
err=False
Hash={}
location={}#This is to change all the output files for appropriate fit.if polyfit=='3':
cx='(-2.954445751*x**3+5.454132888*x**2-2.265884303*x)'
ex='(1.622408565*x**3-1.954950801*x**2+0.753630794*x)'
sx='(-1.550023723*x**3+1.216033377*x**2-0.613513212*x)'
ax='(-0.166034064*x**3-0.081092064*x**2+0.542524049*x)'
bx='(2.695058138*x**3-2.281640778*x**2-3.977643487*x)'
vx='(-1.641156287*x**3+0.649768963*x**2+4.775062366*x)'print('doing 3')if polyfit=='4':
cx='(-0.81158*x**4-1.3744*x**3+4.5172*x**2-2.1039*x)'
ex='(5.9798*x**4-10.02*x**3+4.9483*x**2-0.43968*x)'
sx='(-7.7073*x**4+13.455*x**3-7.6815*x**2+0.92454*x)'
ax='(1.9719*x**4-4.0051*x**3+2.1953*x**2+0.14902*x)'
bx='(-5.1194*x**4+12.662*x**3-8.1915*x**2-2.956*x)'
vx='(7.2178*x**4-15.693*x**3+8.9822*x**2+3.3347*x)'print('doing 4')if polyfit=='5':
cx='(-3.1921*x**5+7.0956*x**4-8.1733*x**3+6.8559*x**2-2.3628*x)'
ex='(-3.2886*x**5+14.126*x**4-17.024*x**3+7.3576*x**2-0.70635*x)'
sx='(-16.138*x**5+32.268*x**4-20.917*x**3+4.1418*x**2-0.38408*x)'
ax='(13.292*x**5-30.953*x**4+24.305*x**3-7.5426*x**2+1.2268*x)'
bx='(1.8013*x**5-9.5814*x**4+16.499*x**3-9.5113*x**2-2.81*x)'
vx='(15.248*x**5-30.554*x**4+16.784*x**3-2.1894*x**2+4.5712*x)'print('doing 5')def replace(text,Echo,Sierra,Austin,Bravo,Victor):
'''Replaces a Hash of characters'''
dic={'cx':cx,'ex':ex,'sx':sx,'ax':ax,'bx':bx,'vx':vx,'Echo':Echo,'Sierra':Sierra,'Bravo':Bravo,'Austin':Austin,'Victor':Victor}for i, j in dic.iteritems():
text = text.replace(i, j)return text
withopen('./ONSAD001.csv','r')as f:
newfile=open('./newfile'+polyfit,'w')for descriptors in f:
descriptors=descriptors.strip('\n').strip('\'')
descriptors=descriptors.split(';')if descriptors[0]=='key':
print('True')print(descriptors[0]+';'+descriptors[1]+';'+descriptors[2]+';'+descriptors[3]+';'+descriptors[4]+';'+descriptors[5]+';'+descriptors[6]+
';'+descriptors[7]+';'+descriptors[8]+';'+descriptors[9]+';'+descriptors[10]+';'+descriptors[11]+';'+descriptors[12]+
';'+descriptors[13]+';'+descriptors[14]+';'+' Max Solubilty [0,1]; Min Solubility[0,1]; root[0]; root[1]; root[2]; root[3]',file=newfile)else:
if descriptors[7]=='-123'or descriptors[8]=='-123'or descriptors[9]=='-123'or descriptors[10]=='-123'or descriptors[11]=='-123':
passelse:
if descriptors[4]in Hash: #descriptors[4] is the CSID
value=Hash.get(descriptors[4])#get the date from the CSIDif value[14]<descriptors[14]: #descriptors[14] is the current compound. If the old date is less then just skipp it
Hash.update({descriptors[4]:descriptors})else:
passelse:
Hash.update({descriptors[4]:descriptors})
newfile.close()
newfile=open('./newfile'+polyfit,'a')def run(csid):
filterroots=[]
OoR=[]
newfile=open('./newfile'+polyfit,'a')
predictions=[]
values={}#print(descriptors)
line=Hash.get(csid)#print(csid)#print(line)
compound=line[1]
Echo=line[7]#9-13
Sierra=line[8]
Austin=line[9]
Bravo=line[10]
Victor=line[11]
equation='cx+ex*Echo+sx*Sierra+ax*Austin+bx*Bravo+vx*Victor'
equation=replace(equation,Echo,Sierra,Austin,Bravo,Victor)
x=Symbol('x')
Alpha=0
Omega=1
y=sympify(equation)
y=simplify(y)#print(equation)#print(y)
yprime=y.diff(x)#print(yprime)
roots=solve(yprime,x)
OoR.append(roots[0])
OoR.append(roots[1])for item in roots:
guess=complex(item).realif guess <=1and guess >=0:
filterroots.append(complex(item).real)try:
values.update({filterroots[0]:y.subs(x,filterroots[0])})
values.update({filterroots[1]:y.subs(x,filterroots[1])})
values.update({filterroots[2]:y.subs(x,filterroots[2])})
values.update({filterroots[3]:y.subs(x,filterroots[3])})except:
pass
values.update({Alpha:y.subs(x,Alpha)})
values.update({Omega:y.subs(x,Omega)})
maximum=max(values.items(), key=lambda x: x[1])[0]
minimum=min(values.items(), key=lambda x: x[1])[0]print(line[0],';',line[1],';',line[2],';',line[3],';',line[4],';',line[5],';',line[6],';',line[7],';',line[8],';',line[9],';',line[10],';',line[11],';',line[12],';',line[13],';',line[14],';',maximum,';',minimum,';',OoR[0],';',OoR[1],';','4',';','5',file=newfile)
newfile.close()
out='./Plots/compounds'+polyfit+'/'+str(csid)+'.png'for item in np.arange(0,1,.01):
predictions.append(y.subs(x,item))
xp=np.linspace(0,1,100)
_=plt.plot(xp,predictions)
plt.ylim(float(min(predictions)*1.1),float(max(predictions)*1.1))
plt.ylabel('Log P')
plt.xlabel('Vol')
plt.title(compound)
plt.gca().tick_params(axis='x', pad=15)try:
plt.savefig(out)exceptOSError,e:
os.mkdirs('./Plots')
plt.savfig(out)print('created Plots dir')
plt.close()try:
parallel=Parallel(n_jobs=-1,verbose=1000)(delayed(run)(csid)for csid in Hash)except:
pass
do=['MAX','MIN']# this will run for 'MAX' and then 'MIN'.for item in do:
numbers=[]withopen('./newfile'+polyfit,'r')as f:
print(item)try:
for line in f:
line=line.split(';')try:
if item =='MAX':
float(line[15])<2
numbers.append(float(line[15].strip('\n')))if item =='MIN':
float(line[16])<2
numbers.append(float(line[16].strip('\n')))except:
passexcept:
pass
out='./'+item+'histogram'+polyfit+'.png'
plt.xlabel('Vol')print('passed')
plt.hist(numbers,bins=25)
plt.gca().tick_params(axis='x', pad=15)
plt.savefig(out)
plt.show()
The maximum solubilities were calculated for Hydrocotisone (0.761910624963585), Benzamide (0.805242993382671), Valdecoxib (0.906636072114208), Antipyrine (0.655966697948519), Aminopyrine (0.847692071360370) using the cubic polynomials. A complete list of the compounds can be found in the previously mentioned Compounds file.
Riboflavin is interesting because its maximum occurs at .2.
2212 compounds were used in making the following histogram. The histogram is not very surprising, but it quickly shows that most of the compounds have maximum solubilities near 1. There were approximately 150 compounds that have maximum solubilities of 0. The histogram on the left includes all compounds in 100 bins, and the histogram on the right only includes compounds in 25 bins.
4th Order Polynomial Fit
5th Order Polynomial Fit
The 5th order polynomials fit produces similar results, but the histogram creating by rerunning the histogram is somewhat interesting because the slight curve in the 3rd order is gone and more compounds have maximums near 1 . The histogram can be reproduced by running the parallelized.py first, and then histogram.py.
The following links contain all the plots of the 3rd, 4th, and 5th order polynomials. On a technical note, they are .tar.gz; so they are compressed and archived. Some older versions of 7-Zip or WinRAR require that you extract them twice - first to the .tar file and then to the unarchived folder.
Differences between poly(3), poly(4), and poly(5) fits for the floating order zero coefficient and when we force the order zero coefficient to be zero.
The following plots shows the histograms of where in the volume ratio scale we have maximum solubility for ~ 2000 compounds. This plots was obtained by the python code compiled on a windows machine using Spyder for interface. First row is for floating, second row is for force zero. The columns increase with the polynomial order.
Histograms of maximums and minimums for drug solubility versus polynomial fit
Comparison of Drug solubility maximums position versus polynomial fit
Can be observed on the following graph, the difference on prediction for maximum solubility as different polynomial order of approximation of the experimental data was used on the model. This difference on prediction can be as high of 15% for caffeine and less than 5% for 1,4 Dioxane, Rifapentine, Benzamide and Adenin. Negligible difference for Ammonia, Betulin, Hydrocortisone and Cimetidine.
Comparison with the Jouyban-Acree Model
The Jouyban-Acree model can be used to approximate the solubility of a compound in the water-ethanol system at various temperatures:
where x is the volume fraction of ethanol and T is the temperature in degrees Kelvin. Using:
We obtain...
Expanding in terms of x gives:
Comparison with our model, where in this case we have let c= 0:
We see that the Jouyban-Acree model restricted to T=298.15 K ....
The coefficients are function of the temperature, but they are not very sensitive, assuming the coefficients for logP_E stay the same as we increase the temperature, which I doubt that is the case. :P
T=198 K
T=298 K
T=398 K
References
1. Abraham MH and Acree Jr. WE: Partition coefficients and solubilities of compounds in the water-ethanol system. J Solution Chem. doi: 10.1007/s10953-011-9719-x
Determining the optimal ethanol/water ratio to maximize solubility in ethanol/water cosolvent systems
Researchers: Andrew SID Lang, Enrique F Valderrama Araya, and Bryant G WilsonIntroduction
Predicting the solubility of a compound in the ethanol/water cosolvent system is important for certain organic syntheses and associated recrystallization procedures as well as drug delivery techniques for drugs that have poor aqueous solubilities. Here, building on the work by Abraham and Acree [1], we present models that can be used to find the optimal ethanol/water volume fractions to optimize solubilities.Method
Abraham solvent coefficients, polynomial fit
Using the data from[ref] for the standard Abraham solvent coefficientsThe coefficients for the fit for each Abraham solvent coefficient are:
The polynomial fit of order 5
present better fit, but we need to add 2 more parameters per curve, a total of 12 new parameters when the solubility is calculated. The plot of the fit by polynomial of order 5 can be seen it below.and the coefficients forcing to pass by zero are:
The polynomial of order 4 with the coefficient of order zero floating
The polynomial of order 4 with the coefficient of order zero force to be zero
Comparison for different fits of the coefficients forcing to pass by zero
Interactive section:
{c(x)=0.004 -2.276x+5.457x^2-2.955x^3 click here to plot and play with c(x)
e(x)=-0.047 +1.073x-2.536x^2+1.935x^3
s(x)=0.042 -0.905x+1.757x^2-1.851x^3
a(x)=0.001 +0.558x-0.147x^2-0.113x^3
b(x)=0.037 -4.244x-1.791x^2+2.428x^3
v(x)=-0.037 +5.039x+0.135x^2-1.346x^3
}
Rcode:
This code will generate the coefficients of fits on a csv file R-fits , which will be use later on opensource software maximasetwd("C:/R") mydata = read.csv(file="R011515.csv",head=TRUE) fitc <- lm(c ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fite <- lm(e ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fits <- lm(s ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fita <- lm(a ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fitb <- lm(b ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fitv <- lm(v ~ 0+ poly(X, 3, raw=TRUE),data=mydata) summary(fitc) str(summary(fitc)) # write.table(summary(fitc)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fitc[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitc)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitc)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") # write.table(summary(fite)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fite[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fite)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fite)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") # write.table(summary(fits)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fits[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fits)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fits)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") # write.table(summary(fita)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fita[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fita)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fita)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") # write.table(summary(fitb)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fitb[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitb)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitb)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") # write.table(summary(fitv)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(fitv[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitv)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitv)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")Maxima code:
Need it files: Rawdata and the coefficients calculated by R :coefficients of fits/* [wxMaxima batch file version 1] [ DO NOT EDIT BY HAND! ]*/ /* [ Created with wxMaxima version 13.04.2 ] */ /* [wxMaxima: input start ] */ /* coefficients file name */ cfn: R011515; /*polynomial order */ PO: 3; /* 2 column csv with first the 4 coefficients then R^2 and R^2adj for each */ m: read_matrix("C:/R/fit_poly3_011515_.csv", 'csv); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ for i:0 thru PO step 1 do c[i]:m[i+1,2]$ for i:0 thru PO step 1 do ldisplay(c[i]); for i:0 thru PO step 1 do e[i]:m[i+1+(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(e[i]); for i:0 thru PO step 1 do s[i]:m[i+1+2*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(s[i]); for i:0 thru PO step 1 do a[i]:m[i+1+3*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(a[i]); for i:0 thru PO step 1 do b[i]:m[i+1+4*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(b[i]); for i:0 thru PO step 1 do v[i]:m[i+1+5*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(v[i]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ c(x):=sum (c[i]*x^i, i, 0, PO); e(x):=sum (e[i]*x^i, i, 0, PO); s(x):=sum (s[i]*x^i, i, 0, PO); a(x):=sum (a[i]*x^i, i, 0, PO); b(x):=sum (b[i]*x^i, i, 0, PO); v(x):=sum (v[i]*x^i, i, 0, PO); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ mraw: read_matrix(concat("C:/R/",string(cfn),".csv"), 'csv); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xc: [ [mraw[2,1],mraw[2,2]], [mraw[3,1],mraw[3,2]], [mraw[4,1],mraw[4,2]], [mraw[5,1],mraw[5,2]], [mraw[6,1],mraw[6,2]], [mraw[7,1],mraw[7,2]], [mraw[8,1],mraw[8,2]], [mraw[9,1],mraw[9,2]], [mraw[10,1],mraw[10,2]], [mraw[11,1],mraw[11,2]], [mraw[12,1],mraw[12,2]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xc], c(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "c(x): exp. data", "c(x): poly(3) fit"], /*[label,["R^2=0.998038422",0.66,-0.2] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/c",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xe: [ [mraw[2,1],mraw[2,3]], [mraw[3,1],mraw[3,3]], [mraw[4,1],mraw[4,3]], [mraw[5,1],mraw[5,3]], [mraw[6,1],mraw[6,3]], [mraw[7,1],mraw[7,3]], [mraw[8,1],mraw[8,3]], [mraw[9,1],mraw[9,3]], [mraw[10,1],mraw[10,3]], [mraw[11,1],mraw[11,3]], [mraw[12,1],mraw[12,3]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xe], e(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "e(x): exp. data", "e(x): poly(3) fit"], /*[label,["R^2=0.944630971",0.66,0.4] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/e",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xs: [ [mraw[2,1],mraw[2,4]], [mraw[3,1],mraw[3,4]], [mraw[4,1],mraw[4,4]], [mraw[5,1],mraw[5,4]], [mraw[6,1],mraw[6,4]], [mraw[7,1],mraw[7,4]], [mraw[8,1],mraw[8,4]], [mraw[9,1],mraw[9,4]], [mraw[10,1],mraw[10,4]], [mraw[11,1],mraw[11,4]], [mraw[12,1],mraw[12,4]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xs], s(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "s(x): exp. data", "s(x): poly(3) fit"], /*[label,["R^2=0.981372471",0.66,-1] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file,concat("C:/R/s",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xa: [ [mraw[2,1],mraw[2,5]], [mraw[3,1],mraw[3,5]], [mraw[4,1],mraw[4,5]], [mraw[5,1],mraw[5,5]], [mraw[6,1],mraw[6,5]], [mraw[7,1],mraw[7,5]], [mraw[8,1],mraw[8,5]], [mraw[9,1],mraw[9,5]], [mraw[10,1],mraw[10,5]], [mraw[11,1],mraw[11,5]], [mraw[12,1],mraw[12,5]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xa], a(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "a(x): exp. data", "a(x): poly(3) fit"], /*[label,["R^2=0.986234862",0.66,0.05] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/a",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xb: [ [mraw[2,1],mraw[2,6]], [mraw[3,1],mraw[3,6]], [mraw[4,1],mraw[4,6]], [mraw[5,1],mraw[5,6]], [mraw[6,1],mraw[6,6]], [mraw[7,1],mraw[7,6]], [mraw[8,1],mraw[8,6]], [mraw[9,1],mraw[9,6]], [mraw[10,1],mraw[10,6]], [mraw[11,1],mraw[11,6]], [mraw[12,1],mraw[12,6]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xb], b(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "b(x): exp. data", "b(x): poly(3) fit"], /*[label,["R^2=0.999726899",0.66,-0.7] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/b",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ xv: [ [mraw[2,1],mraw[2,7]], [mraw[3,1],mraw[3,7]], [mraw[4,1],mraw[4,7]], [mraw[5,1],mraw[5,7]], [mraw[6,1],mraw[6,7]], [mraw[7,1],mraw[7,7]], [mraw[8,1],mraw[8,7]], [mraw[9,1],mraw[9,7]], [mraw[10,1],mraw[10,7]], [mraw[11,1],mraw[11,7]], [mraw[12,1],mraw[12,7]] ]; /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ plot2d([[discrete, xv], v(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]], [legend, "v(x): exp. data", "v(x): poly(3) fit "], /*[label,["R^2=0.999529421",0.66,0.5]],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/v",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input end ] */ /* [wxMaxima: input start ] */ c(x); e(x); s(x); a(x); b(x); v(x); /* [wxMaxima: input end ] */ /* Maxima can't load/batch files which end with a comment! */ "Created with wxMaxima"$Results
Maximum SolubilityUsing the equation log_10 P = c+eE+sS+aA+bB+vV, the maximum solubility of compounds with known Abraham descriptors in any water-ethanol system can be predicted. The following code uses Python 2.7.6 to calculate maximum and minimum solubilities for a list of compounds;it also produces plots for each compound. The program uses sympy to find where the derivative of equation equals 0 and then substitutes those values into the original equation to find the max/min. It requires non-standard libraries including joblib, numpy, sympy, and matplotlib. The //x**2is required for sympy// to properly parse exponents. The program uses the descriptors file with ";" as delimiters. [[file:Compounds is an updated version of ONSAD001 which includes the max/min solubilities for each compound using the 3rd order polynomial; it does not include duplicates or any compounds with unknown Abraham descriptors.
The following calculates the max/min of the compounds in ONSAD001 using 3rd, 4th, or 5th order polyfit. It also creates a graph of each compound and it's solubility. At the end it will create a historgrams showing the frequency of the maximums and then the minimums.
The maximum solubilities were calculated for Hydrocotisone (0.761910624963585), Benzamide (0.805242993382671), Valdecoxib (0.906636072114208), Antipyrine (0.655966697948519), Aminopyrine (0.847692071360370) using the cubic polynomials. A complete list of the compounds can be found in the previously mentioned Compounds file.
Riboflavin is interesting because its maximum occurs at .2.
2212 compounds were used in making the following histogram. The histogram is not very surprising, but it quickly shows that most of the compounds have maximum solubilities near 1. There were approximately 150 compounds that have maximum solubilities of 0. The histogram on the left includes all compounds in 100 bins, and the histogram on the right only includes compounds in 25 bins.
4th Order Polynomial Fit
5th Order Polynomial Fit
The 5th order polynomials fit produces similar results, but the histogram creating by rerunning the histogram is somewhat interesting because the slight curve in the 3rd order is gone and more compounds have maximums near 1 . The histogram can be reproduced by running the parallelized.py first, and then histogram.py.
The following links contain all the plots of the 3rd, 4th, and 5th order polynomials. On a technical note, they are .tar.gz; so they are compressed and archived. Some older versions of 7-Zip or WinRAR require that you extract them twice - first to the .tar file and then to the unarchived folder.
Differences between poly(3), poly(4), and poly(5) fits for the floating order zero coefficient and when we force the order zero coefficient to be zero.
The following plots shows the histograms of where in the volume ratio scale we have maximum solubility for ~ 2000 compounds. This plots was obtained by the python code compiled on a windows machine using Spyder for interface. First row is for floating, second row is for force zero. The columns increase with the polynomial order.
Histograms of maximums and minimums for drug solubility versus polynomial fit
Comparison of Drug solubility maximums position versus polynomial fit
Can be observed on the following graph, the difference on prediction for maximum solubility as different polynomial order of approximation of the experimental data was used on the model. This difference on prediction can be as high of 15% for caffeine and less than 5% for 1,4 Dioxane, Rifapentine, Benzamide and Adenin. Negligible difference for Ammonia, Betulin, Hydrocortisone and Cimetidine.
Comparison with the Jouyban-Acree Model
The Jouyban-Acree model can be used to approximate the solubility of a compound in the water-ethanol system at various temperatures:
where x is the volume fraction of ethanol and T is the temperature in degrees Kelvin. Using:
We obtain...
Expanding in terms of x gives:
Comparison with our model, where in this case we have let c= 0:
We see that the Jouyban-Acree model restricted to T=298.15 K ....
The coefficients are function of the temperature, but they are not very sensitive, assuming the coefficients for logP_E stay the same as we increase the temperature, which I doubt that is the case. :P
References
1. Abraham MH and Acree Jr. WE: Partition coefficients and solubilities of compounds in the water-ethanol system. J Solution Chem. doi: 10.1007/s10953-011-9719-x